Generative models have been widely studied in computer vision. Recently, diffusion models have drawn substantial attention due to the high quality of their generated images. A key desired property of image generative models is the ability to disentangle different attributes: a modification towards a style should leave the semantic content unchanged, and the modification parameters should generalize to different images. Previous studies have found that generative adversarial networks (GANs) are inherently endowed with such a disentanglement capability, so they can perform disentangled image editing without re-training or fine-tuning the network. In this work, we explore whether diffusion models are also inherently equipped with such a capability. Our finding is that for stable diffusion models, partially changing the input text embedding from a neutral description (e.g., "a photo of person") to one with style (e.g., "a photo of person with smile"), while fixing all the Gaussian random noise introduced during the denoising process, modifies the generated images towards the target style without changing the semantic content. Based on this finding, we further propose a simple, lightweight image editing algorithm in which the mixing weights of the two text embeddings are optimized for style matching and content preservation. The entire process involves optimizing only around 50 parameters and does not fine-tune the diffusion model itself. Experiments show that the proposed method can modify a wide range of attributes and outperforms diffusion-model-based image-editing algorithms that require fine-tuning. The optimized weights generalize well to different images. Our code is publicly available at https://github.com/UCSB-NLP-Chang/DiffusionDisentanglement.
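The mixing of the two text embeddings described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the toy 4-dimensional embeddings, and the hand-written weight schedule are hypothetical, and reading "around 50 parameters" as one scalar weight per denoising step is an assumption. In the actual method the weights would be optimized rather than fixed by hand.

```python
from typing import List

Vector = List[float]


def mix_embeddings(neutral: Vector, styled: Vector, weight: float) -> Vector:
    """Blend two text embeddings: weight=0 gives neutral, weight=1 gives styled."""
    if len(neutral) != len(styled):
        raise ValueError("embeddings must have the same length")
    return [(1.0 - weight) * n + weight * s for n, s in zip(neutral, styled)]


def conditioning_schedule(neutral: Vector, styled: Vector,
                          weights: List[float]) -> List[Vector]:
    """Return one blended embedding per denoising step."""
    return [mix_embeddings(neutral, styled, w) for w in weights]


# Toy embeddings standing in for real text-encoder outputs (hypothetical values).
neutral = [0.0, 1.0, 0.0, 2.0]
styled = [1.0, 1.0, 2.0, 0.0]

# Assumption: one mixing weight per denoising step, 50 steps in total.
# Here the first half uses the neutral prompt and the second half the styled one.
weights = [0.0] * 25 + [1.0] * 25
schedule = conditioning_schedule(neutral, styled, weights)
```

Each entry of `schedule` would condition one denoising step, while the random noise is held fixed across the neutral and edited runs.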
Translated by Google Translate
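Knowledge graph (KG) embeddings are a mainstream approach for reasoning over incomplete KGs. However, limited by their inherently shallow and static architectures, they can hardly handle the growing focus on complex logical queries, which comprise logical operators, imputed edges, multiple source entities, and unknown intermediate entities. In this work, we present the Knowledge Graph Transformer (kgTransformer) with masked pre-training and fine-tuning strategies. We design a KG triple transformation method that enables the Transformer to handle KGs, which is further strengthened by Mixture-of-Experts (MoE) sparse activation. We then formulate complex logical queries as masked prediction and introduce a two-stage masked pre-training strategy to improve transferability and generalizability. Extensive experiments on two benchmarks show that kgTransformer consistently outperforms both KG-embedding-based baselines and advanced encoders on nine in-domain and out-of-domain reasoning tasks. In addition, kgTransformer can reason with explainability by providing the full reasoning paths that interpret the given answers.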
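Designing and generating new data under targeted properties has attracted a variety of critical applications, such as molecule design, image editing, and speech synthesis. Traditional hand-crafted approaches rely heavily on expert experience and intensive human effort, yet still suffer from insufficient scientific knowledge and low throughput, and so cannot support effective and efficient data generation. Recently, advances in deep learning have produced expressive methods that can learn the underlying representation and properties of data. This capability provides new opportunities for figuring out the mutual relationship between the structural patterns and the functional properties of data, and for leveraging that relationship to generate structural data with the desired properties. This article provides a systematic review of this promising research area, commonly known as controllable deep data generation. First, the potential challenges are raised and preliminaries are provided. Then, controllable deep data generation is formally defined, a taxonomy of the various techniques is proposed, and the evaluation metrics in this specific domain are summarized. After that, exciting applications of controllable deep data generation are introduced, and existing works are experimentally analyzed and compared. Finally, promising future directions of controllable deep data generation are highlighted and five potential challenges are identified.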
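Building robust and generic object detection frameworks requires scaling to larger label spaces and bigger training datasets. However, acquiring annotations for thousands of categories at a large scale is prohibitively costly. We propose a novel method that leverages the rich semantics available in recent vision and language models to localize and classify objects in unlabeled images, effectively generating pseudo labels for object detection. Starting with a generic and class-agnostic region proposal mechanism, we use vision and language models to categorize each region of an image into any object category required for downstream tasks. We demonstrate the value of the generated pseudo labels in two specific tasks: open-vocabulary detection, where a model needs to generalize to unseen object categories, and semi-supervised object detection, where additional unlabeled images can be used to improve the model. Our empirical evaluation shows the effectiveness of the pseudo labels in both tasks, where we outperform competitive baselines and achieve a new state of the art for open-vocabulary object detection. Our code is available at https://github.com/xiaofeng94/vl-plm.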
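This technical report presents an effective method for motion prediction in autonomous driving. We develop a Transformer-based method for input encoding and trajectory prediction. In addition, we propose a temporal flow header to enhance the trajectory encoding. Finally, an efficient K-means ensemble method is used. With our Transformer network and ensemble method, we won first place in the Argoverse 2 Motion Forecasting Challenge with a state-of-the-art brier-minFDE score of 1.90.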
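Motivation: Computational prediction of multiple-type drug-drug interactions (DDIs) helps reduce unexpected side effects in poly-drug treatments. Although existing computational approaches achieve inspiring results, they ignore the fact that the action of a drug is mainly caused by its chemical structure. In addition, their interpretability is still weak. Results: In this paper, assuming that the interaction between two given drugs is caused by their local chemical structures (substructures) and that their DDI type is determined by the linkages between different substructure sets, we design a novel Substructure-aware Tensor Neural Network model for DDI prediction (STNN-DDI). The proposed model learns a 3-D tensor of (substructure, interaction type, substructure) triplets, which characterizes a substructure-substructure interaction (SSI) space. Given a list of predefined substructures with specific chemical meanings, mapping drugs into this SSI space enables STNN-DDI to perform multiple-type DDI prediction in both transductive and inductive scenarios, in a unified form and in an explainable manner. A comparison with state-of-the-art deep-learning-based baselines demonstrates the superiority of STNN-DDI, with significant improvements in AUC, AUPR, accuracy, and precision. More importantly, case studies illustrate its interpretability, both by revealing the crucial substructure pairs across drugs for a DDI type of interest and by uncovering interaction-type-specific substructure pairs in a given DDI. In summary, STNN-DDI provides an effective approach for predicting DDIs as well as for explaining the interaction mechanisms among drugs.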
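To make the idea of a (substructure, interaction type, substructure) tensor concrete, here is a minimal sketch. It is not the STNN-DDI model: the function name, the toy tensor values, and in particular the choice to score a drug pair by summing tensor entries over all substructure pairs are assumptions made for illustration only.

```python
from typing import List

# tensor[i][r][j]: score for substructure i interacting with substructure j
# under interaction type r (hypothetical layout).
Tensor3D = List[List[List[float]]]


def interaction_scores(subs_a: List[int], subs_b: List[int],
                       tensor: Tensor3D) -> List[float]:
    """Score every interaction type for a drug pair.

    Assumption: the pair score for type r is the sum of tensor[i][r][j]
    over substructures i present in drug A and j present in drug B.
    """
    n_types = len(tensor[0])
    return [
        sum(tensor[i][r][j] for i in subs_a for j in subs_b)
        for r in range(n_types)
    ]


# Toy tensor: 3 substructures, 2 interaction types, all entries zero at first.
tensor = [[[0.0] * 3 for _ in range(2)] for _ in range(3)]
tensor[0][1][2] = 1.5  # substructures 0 and 2 interact strongly under type 1
tensor[1][0][2] = 0.5  # substructures 1 and 2 interact weakly under type 0

drug_a = [0, 1]  # indices of substructures found in drug A
drug_b = [2]     # indices of substructures found in drug B

scores = interaction_scores(drug_a, drug_b, tensor)
predicted_type = max(range(len(scores)), key=scores.__getitem__)
```

Because the score decomposes over substructure pairs, the largest contributing entry (here `tensor[0][1][2]`) points to the substructure pair that drives the prediction, which is the kind of interpretability the abstract refers to.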
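This paper describes a stereo-image-based visual servoing system for trajectory tracking by a non-holonomic robot, without externally derived pose information or a known visual map of the environment. It is called trajectory servoing. The critical component is a feature-based, indirect simultaneous localization and mapping (SLAM) method that provides a pool of available features with estimated depth, so that they can be propagated forward in time to generate image feature trajectories for visual servoing. Short-distance and long-distance experiments show the benefits of trajectory servoing for navigating unknown areas without absolute positioning. Empirically, trajectory servoing achieves better trajectory tracking performance than pose-based feedback when both rely on the same underlying SLAM system.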
Multivariate time series forecasting with hierarchical structure is pervasive in real-world applications. It demands not only predicting each level of the hierarchy, but also reconciling all forecasts to ensure coherency, i.e., the forecasts should satisfy the hierarchical aggregation constraints. Moreover, the disparities in statistical characteristics between levels can be huge, worsened by non-Gaussian distributions and non-linear correlations. To this end, we propose a novel end-to-end hierarchical time series forecasting model, based on a conditioned normalizing-flow-based autoregressive transformer with reconciliation, to represent complex data distributions while simultaneously reconciling the forecasts to ensure coherency. Unlike other state-of-the-art methods, we achieve forecasting and reconciliation simultaneously without requiring any explicit post-processing step. In addition, by harnessing the power of deep models, we do not rely on any assumption such as unbiased estimates or Gaussian distributions. Our evaluation experiments are conducted on four real-world hierarchical datasets from different industrial domains (three public ones and a dataset from the application servers of Alipay's data center), and the preliminary results demonstrate the efficacy of our proposed method.
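The coherency requirement mentioned above (forecasts must satisfy the hierarchical aggregation constraints) can be illustrated with a small example. This sketch only shows what the constraint means; it is not the proposed model's reconciliation mechanism. The two-level hierarchy, the summing-matrix layout, and the function names are assumptions chosen for illustration.

```python
from typing import List


def aggregate(bottom: List[float], summing_matrix: List[List[int]]) -> List[float]:
    """Compute all series in the hierarchy from the bottom-level series."""
    return [sum(s * b for s, b in zip(row, bottom)) for row in summing_matrix]


def is_coherent(forecast: List[float], summing_matrix: List[List[int]],
                n_bottom: int, tol: float = 1e-9) -> bool:
    """Check that every series equals the aggregate implied by the bottom level.

    Assumption: the last n_bottom entries of `forecast` are the bottom-level series.
    """
    implied = aggregate(forecast[-n_bottom:], summing_matrix)
    return all(abs(f - i) <= tol for f, i in zip(forecast, implied))


# Toy hierarchy: total = A + B, with rows ordered [total, A, B].
S = [
    [1, 1],  # total
    [1, 0],  # A
    [0, 1],  # B
]

coherent_forecast = aggregate([3.0, 4.0], S)
incoherent_forecast = [8.0, 3.0, 4.0]  # total does not equal A + B
```

Forecasting each level independently typically yields something like `incoherent_forecast`; reconciliation is the step that maps such forecasts back to a set satisfying the constraint.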
We describe PromptBoosting, a query-efficient procedure for building a text classifier from a neural language model (LM) without access to the LM's parameters, gradients, or hidden representations. This form of "black-box" classifier training has become increasingly important as the cost of training and inference in large-scale LMs grows. But existing black-box LM classifier learning approaches are themselves computationally inefficient, typically specializing LMs to the target task by searching in a large space of (discrete or continuous) prompts using zeroth-order optimization methods. Instead of directly optimizing in prompt space, PromptBoosting obtains a small pool of prompts via a gradient-free approach and then constructs a large pool of weak learners by pairing these prompts with different elements of the LM's output distribution. These weak learners are then ensembled using the AdaBoost algorithm. The entire learning process requires only a small number of forward passes and no backward pass. Experiments show that PromptBoosting achieves state-of-the-art performance in multiple black-box few-shot classification tasks, and matches or outperforms full fine-tuning in both few-shot and standard learning paradigms, while training 10x faster than existing black-box methods.
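The ensembling step can be sketched with a textbook binary AdaBoost over precomputed weak-learner outputs. This is a simplified illustration, not PromptBoosting itself: the prediction tables below are hypothetical stand-ins for the weak learners built from prompts and elements of the LM's output distribution, and the binary labels in {-1, +1} are an assumption made to keep the example short.

```python
import math
from typing import List, Tuple


def adaboost(predictions: List[List[int]], labels: List[int],
             rounds: int) -> List[Tuple[int, float]]:
    """Select and weight weak learners with binary AdaBoost.

    predictions[k][i] is the {-1, +1} output of weak learner k on example i.
    Returns a list of (learner index, learner weight) pairs.
    """
    n = len(labels)
    sample_w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        # Weighted error of every candidate weak learner.
        errors = [
            sum(w for w, p, y in zip(sample_w, preds, labels) if p != y)
            for preds in predictions
        ]
        best = min(range(len(predictions)), key=errors.__getitem__)
        err = min(max(errors[best], 1e-10), 1.0 - 1e-10)
        if err >= 0.5:
            break  # no learner is better than chance on the reweighted data
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((best, alpha))
        # Increase the weight of misclassified examples, then renormalize.
        sample_w = [
            w * math.exp(-alpha * y * p)
            for w, p, y in zip(sample_w, predictions[best], labels)
        ]
        total = sum(sample_w)
        sample_w = [w / total for w in sample_w]
    return ensemble


def predict(ensemble: List[Tuple[int, float]],
            predictions: List[List[int]], i: int) -> int:
    """Weighted vote of the selected weak learners on example i."""
    score = sum(alpha * predictions[k][i] for k, alpha in ensemble)
    return 1 if score >= 0 else -1


labels = [1, 1, -1, -1, 1, -1]
# Hypothetical weak learners, each wrong on a different example.
weak_outputs = [
    [1, 1, -1, -1, -1, -1],
    [1, -1, -1, -1, 1, -1],
    [1, 1, 1, -1, 1, -1],
]
ensemble = adaboost(weak_outputs, labels, rounds=3)
ensemble_predictions = [predict(ensemble, weak_outputs, i) for i in range(len(labels))]
```

Since the weak-learner outputs are computed once by forward passes and then reused, the boosting loop itself needs no further access to the language model, which is consistent with the forward-pass-only setting the abstract describes.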
Robustness evaluation against adversarial examples has become increasingly important for assessing the trustworthiness of the prevailing deep models in natural language processing (NLP). However, in contrast to the computer vision domain, where first-order projected gradient descent (PGD) is used as the benchmark approach to generate adversarial examples for robustness evaluation, NLP lacks a principled first-order gradient-based robustness evaluation framework. The emerging optimization challenges lie in 1) the discrete nature of textual inputs together with the strong coupling between the perturbation location and the actual content, and 2) the additional constraint that the perturbed text should be fluent and achieve a low perplexity under a language model. These challenges make the development of PGD-like NLP attacks difficult. To bridge the gap, we propose TextGrad, a new attack generator using gradient-driven optimization, supporting high-accuracy and high-quality assessment of adversarial robustness in NLP. Specifically, we address the aforementioned challenges in a unified optimization framework. We develop an effective convex relaxation method to co-optimize the continuously relaxed site selection and perturbation variables, and we leverage an effective sampling method to establish an accurate mapping from the continuous optimization variables to the discrete textual perturbations. Moreover, as a first-order attack generation method, TextGrad can be baked into adversarial training to further improve the robustness of NLP models. Extensive experiments demonstrate the effectiveness of TextGrad not only in attack generation for robustness evaluation but also in adversarial defense.
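For readers unfamiliar with the PGD benchmark referred to above, the following is a minimal sketch of PGD on a continuous input, the setting used in computer vision. It is not TextGrad, which additionally has to handle discrete tokens through relaxation and sampling. The linear classifier, the logistic loss, and all parameter values are assumptions chosen to keep the example self-contained.

```python
import math
from typing import List


def score(x: List[float], w: List[float], b: float) -> float:
    """Linear classifier score; the predicted label is its sign."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def loss_gradient(x: List[float], w: List[float], b: float, y: int) -> List[float]:
    """Gradient w.r.t. x of the logistic loss log(1 + exp(-y * score)), y in {-1, +1}."""
    margin = y * score(x, w, b)
    coeff = -y / (1.0 + math.exp(margin))
    return [coeff * wi for wi in w]


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def pgd_attack(x: List[float], w: List[float], b: float, y: int,
               eps: float, step: float, iters: int) -> List[float]:
    """L-infinity PGD: signed gradient ascent on the loss, then projection."""
    x_adv = list(x)
    for _ in range(iters):
        grad = loss_gradient(x_adv, w, b, y)
        # First-order ascent step on the loss.
        x_adv = [xa + step * sign(g) for xa, g in zip(x_adv, grad)]
        # Project back onto the eps-ball around the original input.
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv


w, b = [2.0, -1.0], 0.0
x, y = [1.0, 0.5], 1           # correctly classified: score(x) = 1.5 > 0
x_adv = pgd_attack(x, w, b, y, eps=0.6, step=0.2, iters=10)
```

The projection step is trivial here because the input is a real vector; with text, there is no such continuous ball to project onto, which is the difficulty the abstract points to.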